Universal Access To All Knowledge

Poster: molly (Administrator, Curator, or Staff) Date: Jan 25, 2005 12:38am
Forum: toronto Subject: Re: scanner credit?

>How would you like to be credited for the scans?

How about "Internet Archive/Canadian Libraries"


>Where shall I send the final product once post-processing has been completed, so you can include it with your collection?

Hmmm. I think we host a copy of everything PG/DP does on our FTP server. If we just had an idea of which ones were which, we could take them ourselves. Perhaps an email once in a while with the Project Gutenberg identifier and the Internet Archive identifier?

For example, if you do Hygiene for Young People:
http://www.archive.org/details/OISEhygiene00miniuoft

You would send us an email with
OISEhygiene00miniuoft=981 (doesn't PG number their books?)
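A mapping email in that format could be read with something like the sketch below. It assumes one "IA identifier=PG number" pair per line and that PG numbers are plain integers; the function name parse_id_mapping is made up for illustration, and 981 is only the placeholder number from the example above.

```python
def parse_id_mapping(text):
    """Parse lines of the form '<IA identifier>=<PG number>' into a dict.

    Assumption: one pair per line, PG number is a plain integer.
    Lines that do not fit that shape are skipped rather than raising.
    """
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not a pair
        ia_id, _, pg_id = line.partition("=")
        ia_id, pg_id = ia_id.strip(), pg_id.strip()
        if ia_id and pg_id.isdigit():
            mapping[ia_id] = int(pg_id)
    return mapping


if __name__ == "__main__":
    email_body = "OISEhygiene00miniuoft=981\n"
    print(parse_id_mapping(email_body))
```

Skipping malformed lines keeps a stray greeting or signature in the email from breaking the import.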

Is that easy? We would love to make this automatic in the future though. We really look forward to seeing books from the Canadian Libraries collection be OCR corrected. How exciting!

-Molly

Poster: aronsson Date: Jan 25, 2005 2:41pm
Forum: toronto Subject: Re: scanner credit?

It might be useful to add to the metadata that the U.S. Library of Congress uses call number "QP37 .K75" (LCCN 09024348) and "QP37 .K76" (LCCN 09025953) for this title. These numbers can be searched at http://catalog.loc.gov/

Poster: molly (Administrator, Curator, or Staff) Date: Jan 26, 2005 3:24am
Forum: toronto Subject: Re: scanner credit?

We are querying the University of Toronto's catalogue to get the metadata these days. If we don't get a match, then we enter the info in by hand.

I see that LoC has two records for this book, which is cool. I wonder how we would go about dealing with slightly different information for different editions? Hmm!

-m

Poster: aronsson Date: Jan 26, 2005 6:26am
Forum: toronto Subject: Re: scanner credit?

Hi Molly, I really appreciate your effort to answer all questions. But I don't know your background. Are you a librarian? Otherwise, you should use the librarians in Toronto for cataloging expertise. They should know all about things like editions, LoC, LCCN, OCLC, etc.

The LoC catalog (like all big catalogs) has a lot of errors in it, but they also have an online error reporting tool that I constantly use. We can all improve from feedback.

There is even some new thinking (FRBR) going on in the cataloging world, with people like Roy Tennant at the University of California main office in Oakland, which is just across the Bay from the Internet Archive at the Presidio. I'm a C/UNIX programmer but when I hang around on librarian mailing lists I get comments such as "Brewster Kahle is not a librarian!!!" and that seems so unnecessary, when the IA could easily cover its reputation with a little cooperation and openness.

Also, Michael Hart has started to call for librarians to cooperate with him to produce MARC records for all Project Gutenberg e-texts, and this should perhaps be concerted with the Million Book and Canadian Libraries projects at the Internet Archive.

By the way, when I receive reminders about this discussion, the e-mail contains the URL http://www.archive.org//details/toronto#forum with the double slash, that this website refuses to handle.

Poster: pt Date: Jan 26, 2005 7:21am
Forum: toronto Subject: Re: scanner credit?

|Hi Molly, I really appreciate your effort to
|answer all questions. But I don't know your
|background. Are you a librarian? Otherwise, you
|should use the librarians in Toronto for
|cataloging expertise. They should know all about
|things like editions, LoC, LCCN, OCLC, etc.

Hi, I'm Parker, an Archivist at the IA, and I spent six years in Library/Information schools (U.C. Berkeley and U. of Washington) in pursuit of my librarian cred (resume available upon request). I've been working with Molly (the project manager) and librarians at the U. of Toronto on metadata for the Canadian Libraries collection.

We went with the UofT catalogue records because we could be sure that there would be a minimal number of problems matching records, and they provided us great APIs for interacting with the catalogue to get the kind of data we wanted/needed for our purposes, specifically Dublin Core and MARC XML records.

I agree it would be great to have other kinds of records as well associated with the books, so that anyone with these records could get at the content. However, we're a small organization and have to make choices about where we spend our resources. If you or someone you know is interested in helping us associate other types of records with these books please have them contact us (info@archive.org). This would be great.

|There is even some new thinking (FRBR) going on
|in the cataloging world, with people like Roy
|Tennant at the University of California main
|office in Oakland, which is just across the Bay
|from the Internet Archive at the Presidio. I'm a
|C/UNIX programmer but when I hang around on
|librarian mailing lists I get comments such as
|"Brewster Kahle is not a librarian!!!" and that
|seems so unnecessary, when the IA could easily
|cover its reputation with a little cooperation
|and openness.

Please remind these people that contrary to what the ALA might say, they do not own the word "librarian". Brewster founded and runs a (digital) library, where we archive and distribute books/music/video/software/data. In my mind it is that that makes him a librarian.

We're happy to work with librarians who have expertise, and freely admit we could use their advice and expertise. We work with librarians every day on various projects, but we're always glad to have more advice and help (please!). Again, we're a small organization and there are limits to what we can do without help.

I'd be happy to hear suggestions about ways in which we could be more open to the library community. We don't have any secrets that I know of :). Next time someone says we are not open, please ask them to contact me (info@archive.org, ask for Parker).

|Also, Michael Hart has started to call for
|librarians to cooperate with him to produce MARC
|records for all Project Gutenberg e-texts, and
|this should perhaps be concerted with the Million
|Book and Canadian Libraries projects at the
|Internet Archive.

Where possible we fetch MARC XML records from the UofT catalogue. These are included with the scanned books and are available much more frequently for recently scanned books than for the early scans. To see them try:

http://www.archive.org/download/bookidentifier/bookidentifier_marc.xml

or click the 'All files' link on the side and look for it there. For books without MARC XML records I believe we'll be working with the folks at PG to figure out the easiest way to make our content work for them and vice versa.
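As a rough illustration of the URL pattern above, here is a small sketch that fills in a book identifier. The helper name marc_xml_url is invented for this example, and the result is only a candidate address: as noted, not every book has a MARC XML record, so the file may not exist.

```python
from urllib.parse import quote

# Base taken from the URL pattern quoted in the post.
DOWNLOAD_BASE = "http://www.archive.org/download"


def marc_xml_url(identifier):
    """Build the MARC XML URL for an identifier, following the
    .../download/<id>/<id>_marc.xml pattern. No network request is made."""
    safe_id = quote(identifier, safe="")  # percent-encode anything unusual
    return f"{DOWNLOAD_BASE}/{safe_id}/{safe_id}_marc.xml"


if __name__ == "__main__":
    print(marc_xml_url("OISEhygiene00miniuoft"))
```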

|By the way, when I receive reminders about this
|discussion, the e-mail contains the URL
|http://www.archive.org//details/toronto#forum
|with the double slash, that this website refuses
|to handle.

Thanks for the report, I'll see what I can do about fixing this error.
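For anyone hitting the same problem in the meantime, one generic way to clean up such a link is to collapse repeated slashes in the path part only. This is just a sketch of that idea, not a description of how the site handles or will handle it; the function name collapse_path_slashes is hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_path_slashes(url):
    """Collapse runs of '/' in the path of a URL, leaving the scheme,
    host, query string and fragment untouched."""
    parts = urlsplit(url)
    clean_path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(
        (parts.scheme, parts.netloc, clean_path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(collapse_path_slashes("http://www.archive.org//details/toronto#forum"))
```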

pt.

Poster: molly (Administrator, Curator, or Staff) Date: Jan 27, 2005 12:49am
Forum: toronto Subject: Re: scanner credit?

Actually, I'm not a librarian, I have an art degree. We are working very closely with the librarians in Toronto though, and one tends to pick up various acronyms for standards and metadata by hanging out with them. *grin*

We have everything to gain from being open and working with librarians, and we feel we are doing a pretty good job for an organization with 20 people in it. We pride ourselves on being open, being brutally honest, and developing and using open source software, so it's weird to hear that we have the opposite reputation. We depend on the outside community for help though, so if you would like to help make the book collection better by assigning LoC catalogue numbers to everything, we would love your expertise! I'm molly at archive dot org.

-m

Poster: aronsson Date: Jan 27, 2005 5:16pm
Forum: toronto Subject: Re: scanner credit?

I'm looking forward to our cooperation. As for openness: In August 2004 I wrote a message on the Million Books discussion (Subject: German texts), and it took until December before someone (you, actually) answered. In those three months I didn't know if the project was dead, if you were busy doing other things, or if nobody wanted to communicate with me. According to other messages in that forum, other people had the same impression. I'm glad that this has changed, and hope you and Parker will continue to be around. You are doing many exciting things here, but I also see a lot of room for improvement. I have a lot of experience in receiving (harsh and undue) criticism, so I'll try my best to be constructive.

I've been running Project Runeberg (runeberg.org) with Scandinavian e-texts since 1992 and digital facsimiles since 1998 with just a handful of unpaid core volunteers, so 20 people is a huge organization to me. You are free to reuse our contents, which are in the public domain. We don't claim or reserve any rights to the digitization. (However, we don't want the name "Project Runeberg" to be reused by others.) At the very bottom of the title page for each of our books there is a "download" link that helps you retrieve a ZIP archive of 600dpi TIFF G4 images. Today this ZIP archive doesn't contain any OCR text or metadata, but we should improve this in the future.

Places that I hang around include the Web4lib, Bookpeople, and Diglib mailing lists, and the D-Lib Magazine website.